The first two chunks of this R Markdown file after the setup chunk enable plot zooming, but as a result the knitted HTML file must be opened in a web browser to display properly. When the file is knitted in RStudio the preview will appear empty, but when the HTML file is opened in a browser all the content is there and you can click on any plot to zoom in on it.
If you have questions, please email the most recent author, currently:
Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com
(update/add authors as needed)
Before starting, make sure you have the latest versions of R and RStudio installed. This code was written under R version 4.2.3 and RStudio version 2024.04.2+764.
You can download R and RStudio HERE
This script is written in R Markdown, so it mixes Markdown text with R code chunks. If you plan to run this script with new data or make any modifications, you will want to be familiar with the basics of R Markdown.
Below is an R Markdown cheatsheet to help you get started:
R Markdown cheatsheet
If you don’t already have the following packages installed, use the code below to install them. *NOTE: this chunk will not run automatically because eval=FALSE is set in the chunk options (I don’t want it to run every time I run this code, since I already have the packages installed).
```r
install.packages('tidyverse')
install.packages('PerformanceAnalytics')
install.packages('Hmisc')
```
Then load the packages into your R session.
```r
library(tidyverse) # data tidying, visualization, and much more; attaches the core tidyverse packages (tidyverse_packages() lists every package in the tidyverse)
library(PerformanceAnalytics) # used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in a data frame
```
We have three data files that represent possible covariates for the analysis and we will import all of them at once here.
- SRFN_HFI.csv, which contains the human footprint inventory (anthropogenic disturbances) on the landscape from ABMI’s Wall-to-Wall Human Footprint Inventory - Year 2021
- SRFN_landcover.csv, which contains the landcover inventory (landcover types) on the landscape from ABMI’s Wall-to-Wall landcover Inventory - Year 2010
- SRFN_harvest.csv, which appears to contain the proportion of each buffer harvested per year, from the same source as the HFI data. We extracted this after the fact to get the harvest years, which weren’t in our original download, so we will need to add them back to the data
```r
# these data files have a similar format, so we can read them in together using the map() function from the purrr package
srfn_covariate_data <-
  # provide the file path (i.e. the folders where the data are stored)
  file.path('data/raw',
            # provide the file names
            c('SRFN_HFI.csv',
              'SRFN_landcover.csv',
              'SRFN_harvest.csv')) %>%
  # use purrr::map() to read in each file; the ~ creates an anonymous function and .x stands for
  # each element of the input (here, each file path) in turn, so every step inside map() is applied to every file
  map(~.x %>%
        read_csv(.,
                 # specify how to read in the various columns
                 col_types = cols(Site = col_factor(),
                                  BUFF_DIST = col_integer(),
                                  .default = col_number())) %>%
        # rename the site column to site_number so it is more descriptive and consistent for joining data later
        rename(site_number = Site) %>%
        # set the column names to lower case, which makes them easier to reference later (no typing in all caps)
        set_names(
          names(.) %>%
            tolower()) %>%
        # reorder columns: site_number, buff_dist, then the rest alphabetically
        select(site_number, buff_dist, sort(setdiff(names(.), c('site_number', 'buff_dist'))))) %>%
  # name the three elements of the list; otherwise they are numbered ([[1]], [[2]], [[3]]), which can get confusing
  purrr::set_names('HFI',
                   'VEG',
                   'harvest')
```
What we did above is create a list containing three elements, the three data frames we just read in. We did a bit of data tidying along the way and then named each element (HFI, VEG, and harvest) so we can easily reference them from the list later (e.g. srfn_covariate_data$HFI).
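To see the read-in pattern in isolation, here is a minimal sketch that writes two tiny CSV files with made-up values to a temporary folder and reads them back into a named list using the same map() + set_names() approach. The file names, the values, and the list names 'A' and 'B' are invented for illustration; it assumes readr, dplyr, and purrr are installed.

```r
library(readr)
library(dplyr)
library(purrr)

# write two small example files (made-up data) to a temporary folder
tmp <- tempdir()
write_csv(tibble(Site = c(1, 2), BUFF_DIST = c(250, 500), `road-winter` = c(0, 0.1)),
          file.path(tmp, 'example_a.csv'))
write_csv(tibble(Site = c(1, 2), BUFF_DIST = c(250, 500), `110` = c(0.2, 0.3)),
          file.path(tmp, 'example_b.csv'))

example_list <-
  file.path(tmp, c('example_a.csv', 'example_b.csv')) %>%
  # .x is each file path in turn
  map(~ read_csv(.x,
                 col_types = cols(Site = col_factor(),
                                  BUFF_DIST = col_integer(),
                                  .default = col_number())) %>%
        rename(site_number = Site) %>%
        # lower-case every column name
        rename_with(tolower)) %>%
  # name the list elements
  set_names('A', 'B')

str(example_list)
```

Here rename_with(tolower) is an alternative to the set_names(names(.) %>% tolower()) step used in the chunk above; both lower-case every column name.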
Even though we specified types for some columns during the import step, it’s always a good idea to check the internal structure.
```r
str(srfn_covariate_data)
```
## List of 3
## $ HFI : tibble [1,200 × 77] (S3: tbl_df/tbl/data.frame)
## ..$ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## ..$ airp-runway : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit-dry : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit-wet : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpits : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ buffer_area : num [1:1200] 196260 196260 196260 196260 196260 ...
## ..$ camp-industrial : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ campground : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ canal : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cfo : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing-unknown : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing-wellpad-unconfirmed: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ conventional-seismic : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
## ..$ country-residence : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ crop : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cultivation_abandoned : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ dugout : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility-other : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility-unknown : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ fruit-vegetables : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ golfcourse : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ greenspace : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ grvl-sand-pit : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ harvest-area : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
## ..$ harvest-area-white-zone : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ lagoon : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ landfill : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ low-impact-seismic : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mill : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mines-pitlake : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ misc-oil-gas-facility : num [1:1200] 0 0.131 0 0 0 ...
## ..$ oil-gas-plant : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ open-pit-mine : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ pipeline : num [1:1200] 0 0.148 0.0148 0 0 ...
## ..$ recreation : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ reservoir : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ residence_clearing : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy-mlt-track : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy-sgl-track : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy-spur : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-gravel-1l : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 0.00 ...
## ..$ road-gravel-2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-1l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-3l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-4l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-div : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-undiv-1l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-paved-undiv-2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-unclassified : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-unimproved : num [1:1200] 0 0 0 0 0.00675 ...
## ..$ road-unpaved-2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road-winter : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rough_pasture : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ runway : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rural-residence : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ sump : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ surrounding-veg : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ tame_pasture : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ trail : num [1:1200] 0 0 0.011 0 0 ...
## ..$ transfer_station : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ transmission-line : num [1:1200] 0 0 0 0 0 ...
## ..$ truck-trail : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban-industrial : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban-residence : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated-edge-railways : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated-edge-roads : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
## ..$ well_cleared_not_confirmed : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cleared_not_drilled : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well-aband : num [1:1200] 0 0 0 0 0 ...
## ..$ well-bitumen : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well-cased : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well-gas : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well-oil : num [1:1200] 0 0 0 0 0.0332 ...
## ..$ well-other : num [1:1200] 0 0 0.0183 0.0318 0 ...
## ..$ well-unknown : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ VEG : tibble [1,200 × 11] (S3: tbl_df/tbl/data.frame)
## ..$ site_number: Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## ..$ 110 : num [1:1200] 0 0.3608 0.0618 0 0 ...
## ..$ 120 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 20 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 210 : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
## ..$ 220 : num [1:1200] 0 0.18 0 0 0 ...
## ..$ 230 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 33 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 34 : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
## ..$ 50 : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
## $ harvest: tibble [1,200 × 63] (S3: tbl_df/tbl/data.frame)
## ..$ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## ..$ 1940 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1950 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1960 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1966 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1967 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1968 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1969 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1970 : num [1:1200] 0 0.342 0 0.209 0 ...
## ..$ 1971 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1972 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1973 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1974 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1975 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1976 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1977 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1978 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1979 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1980 : num [1:1200] 0 0 0 0 0 ...
## ..$ 1981 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1982 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1983 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1984 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1985 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1986 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1987 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1988 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1989 : num [1:1200] 0.0285 0 0 0 0 ...
## ..$ 1990 : num [1:1200] 0 0 0 0 0 ...
## ..$ 1991 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1992 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1993 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1994 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1995 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1996 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1997 : num [1:1200] 0.0478 0 0 0 0 ...
## ..$ 1998 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1999 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2000 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2001 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2002 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2003 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2004 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2005 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2006 : num [1:1200] 0 0 0 0.179 0.424 ...
## ..$ 2007 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2008 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2009 : num [1:1200] 0 0 0 0 0 ...
## ..$ 2010 : num [1:1200] 0.355 0 0 0 0 ...
## ..$ 2011 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2012 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2013 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2014 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2015 : num [1:1200] 0 0 0 0 0 ...
## ..$ 2016 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2017 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2018 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2019 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2020 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2021 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ buffer_area : num [1:1200] 196260 196260 196260 196260 196260 ...
## ..$ feature_area: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
From a quick glance everything looks good.
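Scanning str() output by eye gets tedious with 77 columns. A quicker check is to tabulate the class of every column; for these data we expect one factor (site_number), one integer (buff_dist), and numeric for everything else. This sketch uses a small made-up tibble standing in for one element of srfn_covariate_data.

```r
library(purrr)
library(tibble)

# made-up stand-in for one element of the covariate list
example_hfi <- tibble(
  site_number = factor(c('1', '2')),
  buff_dist = c(250L, 250L),
  `road-winter` = c(0, 0.1),
  pipeline = c(0, 0.148)
)

# first class of every column, then a count of each class
column_classes <- map_chr(example_hfi, ~ class(.x)[1])
table(column_classes)
```

To run this on the whole list at once, something like map(srfn_covariate_data, \(df) table(map_chr(df, \(col) class(col)[1]))) should work (the \(x) lambda syntax needs R 4.1 or later).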
Now let’s check that all the sites are accounted for. There should be 53, based on the sites that had GPS info in the GrizzleyRidge_camera file.
```r
# check that the sites are all there and entered correctly
# since the data sets are in a list, we need to call the list first, then the data name in the list, then the column name
levels(srfn_covariate_data$HFI$site_number)
```
## [1] "1" "2" "4" "6" "10" "12" "13" "17" "18" "21" "23" "24"
## [13] "26" "30" "31" "35" "37" "38" "39" "42" "44" "45" "46" "51"
## [25] "55" "56" "58" "60" "62" "63" "67" "70" "74" "75" "77" "81"
## [37] "83" "88" "94" "95" "99" "100" "104" "105" "107" "110" "117" "118"
## [49] "119" "120" "121" "124" "125" "130" "131" "132" "136" "137" "140" "141"
```r
levels(srfn_covariate_data$VEG$site_number)
```
## [1] "1" "2" "4" "6" "10" "12" "13" "17" "18" "21" "23" "24"
## [13] "26" "30" "31" "35" "37" "38" "39" "42" "44" "45" "46" "51"
## [25] "55" "56" "58" "60" "62" "63" "67" "70" "74" "75" "77" "81"
## [37] "83" "88" "94" "95" "99" "100" "104" "105" "107" "110" "117" "118"
## [49] "119" "120" "121" "124" "125" "130" "131" "132" "136" "137" "140" "141"
```r
levels(srfn_covariate_data$harvest$site_number)
```
## [1] "1" "2" "4" "6" "10" "12" "13" "17" "18" "21" "23" "24"
## [13] "26" "30" "31" "35" "37" "38" "39" "42" "44" "45" "46" "51"
## [25] "55" "56" "58" "60" "62" "63" "67" "70" "74" "75" "77" "81"
## [37] "83" "88" "94" "95" "99" "100" "104" "105" "107" "110" "117" "118"
## [49] "119" "120" "121" "124" "125" "130" "131" "132" "136" "137" "140" "141"
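Instead of comparing three printouts side by side, we can ask R directly whether the site levels are identical across the data frames. A minimal sketch, using a made-up list standing in for srfn_covariate_data:

```r
library(purrr)
library(tibble)

# made-up stand-in for srfn_covariate_data with three elements
example_list <- list(
  HFI     = tibble(site_number = factor(c('1', '2', '4'))),
  VEG     = tibble(site_number = factor(c('1', '2', '4'))),
  harvest = tibble(site_number = factor(c('1', '2', '4')))
)

# pull the site levels from each data frame
site_levels <- map(example_list, ~ levels(.x$site_number))

# TRUE if every data frame has exactly the same site levels as the first one
all_match <- every(site_levels, ~ identical(.x, site_levels[[1]]))

# number of sites in each data frame
n_sites <- map_int(site_levels, length)
```

identical() also compares the order of the levels, so if the files could list sites in a different order, comparing with setequal() instead is a looser check.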
We want to make sure the site names match those in the camera data, so let’s import the timelapse data from the previous script (01_ACME_SRFN_camera….) to check.
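One way to compare the two sets of sites is with setdiff() in both directions. This is a sketch with made-up site IDs; the real vectors would come from the covariate data and from the timelapse data, and the object and column names used for the camera data are assumptions, since they depend on the earlier script.

```r
# made-up site lists standing in for the real data
covariate_sites <- c('1', '2', '4', '6')  # e.g. levels(srfn_covariate_data$HFI$site_number)
camera_sites <- c('1', '2', '4', '7')     # hypothetical: unique site IDs from the timelapse data

# sites that have covariates but no camera record, and vice versa
missing_from_camera <- setdiff(covariate_sites, camera_sites)
missing_from_covariates <- setdiff(camera_sites, covariate_sites)
```

Both results should be character(0) when the site lists match. Converting both vectors to character first avoids false mismatches between factor, numeric, and character site IDs.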
All the sites look like they match up
We should also check that the column names look good. There are a lot of them in the HFI data frame, so we won’t look at each feature individually, just check that the general formatting and naming are okay.
```r
names(srfn_covariate_data$HFI)
```
## [1] "site_number" "buff_dist"
## [3] "airp-runway" "borrowpit-dry"
## [5] "borrowpit-wet" "borrowpits"
## [7] "buffer_area" "camp-industrial"
## [9] "campground" "canal"
## [11] "cfo" "clearing-unknown"
## [13] "clearing-wellpad-unconfirmed" "conventional-seismic"
## [15] "country-residence" "crop"
## [17] "cultivation_abandoned" "dugout"
## [19] "facility-other" "facility-unknown"
## [21] "fruit-vegetables" "golfcourse"
## [23] "greenspace" "grvl-sand-pit"
## [25] "harvest-area" "harvest-area-white-zone"
## [27] "lagoon" "landfill"
## [29] "low-impact-seismic" "mill"
## [31] "mines-pitlake" "misc-oil-gas-facility"
## [33] "oil-gas-plant" "open-pit-mine"
## [35] "pipeline" "recreation"
## [37] "reservoir" "residence_clearing"
## [39] "rlwy-mlt-track" "rlwy-sgl-track"
## [41] "rlwy-spur" "road-gravel-1l"
## [43] "road-gravel-2l" "road-paved-1l"
## [45] "road-paved-2l" "road-paved-3l"
## [47] "road-paved-4l" "road-paved-div"
## [49] "road-paved-undiv-1l" "road-paved-undiv-2l"
## [51] "road-unclassified" "road-unimproved"
## [53] "road-unpaved-2l" "road-winter"
## [55] "rough_pasture" "runway"
## [57] "rural-residence" "sump"
## [59] "surrounding-veg" "tame_pasture"
## [61] "trail" "transfer_station"
## [63] "transmission-line" "truck-trail"
## [65] "urban-industrial" "urban-residence"
## [67] "vegetated-edge-railways" "vegetated-edge-roads"
## [69] "well_cleared_not_confirmed" "well_cleared_not_drilled"
## [71] "well-aband" "well-bitumen"
## [73] "well-cased" "well-gas"
## [75] "well-oil" "well-other"
## [77] "well-unknown"
These look okay, but we should replace the dashes (‘-’) with underscores (‘_’) to match the formatting of the other files and because underscores are easier for R to work with (column names containing dashes must be wrapped in backticks). We will do this in a later step along with any other fixes, since we don’t need it fixed now.
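A quick way to see why dashes cause trouble: make.names() returns a name unchanged only if it is already a syntactic R name, so comparing a name to make.names() of itself flags the ones that need backticks (e.g. df$`road-gravel-1l`; without backticks, df$road-gravel-1l is parsed as a subtraction). A small base R sketch using a few of the HFI column names:

```r
# a few column names taken from the HFI data
hfi_names <- c('road-gravel-1l', 'well-oil', 'tame_pasture')

# TRUE only for names that are already syntactic (usable without backticks)
is_syntactic <- make.names(hfi_names) == hfi_names
is_syntactic
# → FALSE FALSE TRUE

# after swapping '-' for '_' (str_replace_all() from stringr does the same), every name is syntactic
fixed_names <- gsub('-', '_', hfi_names, fixed = TRUE)
all(make.names(fixed_names) == fixed_names)
```

The same check would also flag the VEG and harvest columns, whose names start with a digit (e.g. '110', '1940'), so those need backticks too.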
We also want to add array and camera columns, which we can do using the site data.
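A minimal sketch of how that join might look with dplyr::left_join(). The site table here (site_data, with array and camera columns and the IDs shown) is hypothetical, since its real structure isn't shown in this section; the covariate rows are made up as well.

```r
library(dplyr)
library(tibble)

# hypothetical site table: one row per site with its array and camera IDs
site_data <- tibble(
  site_number = factor(c('1', '2')),
  array = c('A1', 'A1'),
  camera = c('CAM01', 'CAM02')
)

# made-up covariate rows (two buffer distances per site)
example_hfi <- tibble(
  site_number = factor(c('1', '1', '2', '2')),
  buff_dist = c(250L, 500L, 250L, 500L),
  pipeline = c(0, 0.01, 0.148, 0.12)
)

# add array and camera to every covariate row by matching site_number,
# then move the new columns next to site_number
example_hfi_joined <- example_hfi %>%
  left_join(site_data, by = 'site_number') %>%
  relocate(array, camera, .after = site_number)
```

left_join() keeps every covariate row, so a site missing from the site table shows up as NA in array and camera rather than being dropped. If the site table could contain duplicate rows for a site, dplyr 1.1.0 and later accept relationship = 'many-to-one' in left_join() to turn that into an error.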
Let’s check the VEG data too
```r
names(srfn_covariate_data$VEG)
```
## [1] "site_number" "buff_dist" "110" "120" "20"
## [6] "210" "220" "230" "33" "34"
## [11] "50"
And finally the harvest data
```r
names(srfn_covariate_data$harvest)
```
## [1] "site_number" "buff_dist" "1940" "1950" "1960"
## [6] "1966" "1967" "1968" "1969" "1970"
## [11] "1971" "1972" "1973" "1974" "1975"
## [16] "1976" "1977" "1978" "1979" "1980"
## [21] "1981" "1982" "1983" "1984" "1985"
## [26] "1986" "1987" "1988" "1989" "1990"
## [31] "1991" "1992" "1993" "1994" "1995"
## [36] "1996" "1997" "1998" "1999" "2000"
## [41] "2001" "2002" "2003" "2004" "2005"
## [46] "2006" "2007" "2008" "2009" "2010"
## [51] "2011" "2012" "2013" "2014" "2015"
## [56] "2016" "2017" "2018" "2019" "2020"
## [61] "2021" "buffer_area" "feature_area"
Let’s check the summaries for any NAs that shouldn’t be in the data; mostly we are looking for NAs in the site_number or buff_dist columns.
```r
summary(srfn_covariate_data$HFI)
```
## site_number buff_dist airp-runway borrowpit-dry
## 1 : 20 Min. : 250 Min. :0 Min. :0.0000000
## 2 : 20 1st Qu.:1438 1st Qu.:0 1st Qu.:0.0000000
## 4 : 20 Median :2625 Median :0 Median :0.0000000
## 6 : 20 Mean :2625 Mean :0 Mean :0.0004945
## 10 : 20 3rd Qu.:3812 3rd Qu.:0 3rd Qu.:0.0005115
## 12 : 20 Max. :5000 Max. :0 Max. :0.0296372
## (Other):1080
## borrowpit-wet borrowpits buffer_area camp-industrial
## Min. :0.0000000 Min. :0.000e+00 Min. : 196260 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.: 6525640 1st Qu.:0
## Median :0.0000000 Median :0.000e+00 Median :21686712 Median :0
## Mean :0.0001462 Mean :4.598e-05 Mean :28163286 Mean :0
## 3rd Qu.:0.0000760 3rd Qu.:0.000e+00 3rd Qu.:45679477 3rd Qu.:0
## Max. :0.0073210 Max. :2.615e-03 Max. :78503934 Max. :0
##
## campground canal cfo clearing-unknown
## Min. :0.000e+00 Min. :0.0000000 Min. :0 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.0000000 Median :0 Median :4.240e-07
## Mean :2.409e-06 Mean :0.0002124 Mean :0 Mean :8.076e-04
## 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:1.124e-03
## Max. :4.967e-04 Max. :0.0076994 Max. :0 Max. :2.818e-02
##
## clearing-wellpad-unconfirmed conventional-seismic country-residence
## Min. :0.0000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.001827 1st Qu.:0.000000
## Median :0.0000000 Median :0.003612 Median :0.000000
## Mean :0.0001149 Mean :0.004028 Mean :0.000439
## 3rd Qu.:0.0000000 3rd Qu.:0.005451 3rd Qu.:0.000000
## Max. :0.0027152 Max. :0.030028 Max. :0.056385
##
## crop cultivation_abandoned dugout
## Min. :0.00000 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.00000 Median :0.000000 Median :0.000e+00
## Mean :0.02988 Mean :0.001701 Mean :2.309e-05
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.000e+00
## Max. :0.43283 Max. :0.040084 Max. :1.239e-03
##
## facility-other facility-unknown fruit-vegetables golfcourse
## Min. :0.0000000 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0.0000000 Median :0.0000000 Median :0 Median :0
## Mean :0.0003137 Mean :0.0000223 Mean :0 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0
## Max. :0.0774805 Max. :0.0064178 Max. :0 Max. :0
##
## greenspace grvl-sand-pit harvest-area
## Min. :0.000e+00 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.02588
## Median :0.000e+00 Median :0.000000 Median :0.23866
## Mean :1.424e-05 Mean :0.001116 Mean :0.23873
## 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.37536
## Max. :2.346e-03 Max. :0.416663 Max. :0.98631
##
## harvest-area-white-zone lagoon landfill low-impact-seismic
## Min. :0.00000 Min. :0.000e+00 Min. :0 Min. :0.000e+00
## 1st Qu.:0.00000 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.00000 Median :0.000e+00 Median :0 Median :0.000e+00
## Mean :0.01302 Mean :3.106e-05 Mean :0 Mean :1.828e-05
## 3rd Qu.:0.00000 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0.000e+00
## Max. :0.80503 Max. :4.257e-03 Max. :0 Max. :6.059e-03
##
## mill mines-pitlake misc-oil-gas-facility oil-gas-plant open-pit-mine
## Min. :0 Min. :0 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0.0000000 Median :0 Median :0
## Mean :0 Mean :0 Mean :0.0013619 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0007224 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0.1313891 Max. :0 Max. :0
##
## pipeline recreation reservoir residence_clearing
## Min. :0.00000 Min. :0.000e+00 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.00000 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.00450 Median :0.000e+00 Median :0.000e+00 Median :0.0000000
## Mean :0.01031 Mean :6.623e-05 Mean :8.539e-05 Mean :0.0001049
## 3rd Qu.:0.01523 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0.14867 Max. :7.941e-03 Max. :1.393e-02 Max. :0.0132461
##
## rlwy-mlt-track rlwy-sgl-track rlwy-spur road-gravel-1l
## Min. :0 Min. :0.0000000 Min. :0 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.0006477
## Median :0 Median :0.0000000 Median :0 Median :0.0043887
## Mean :0 Mean :0.0001036 Mean :0 Mean :0.0056608
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.0088573
## Max. :0 Max. :0.0036376 Max. :0 Max. :0.0598752
##
## road-gravel-2l road-paved-1l road-paved-2l road-paved-3l
## Min. :0.000e+00 Min. :0.000e+00 Min. :0 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0
## Median :0.000e+00 Median :0.000e+00 Median :0 Median :0
## Mean :3.886e-05 Mean :5.918e-06 Mean :0 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0
## Max. :1.820e-03 Max. :6.158e-04 Max. :0 Max. :0
##
## road-paved-4l road-paved-div road-paved-undiv-1l road-paved-undiv-2l
## Min. :0 Min. :0 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0 Median :0 Median :0.000e+00 Median :0.0000000
## Mean :0 Mean :0 Mean :7.538e-06 Mean :0.0005671
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0 Max. :0 Max. :1.051e-03 Max. :0.0066563
##
## road-unclassified road-unimproved road-unpaved-2l road-winter
## Min. :0.0000000 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0001997 1st Qu.:0 1st Qu.:0
## Median :0.0000000 Median :0.0009214 Median :0 Median :0
## Mean :0.0001274 Mean :0.0011036 Mean :0 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0014730 3rd Qu.:0 3rd Qu.:0
## Max. :0.0145510 Max. :0.0237365 Max. :0 Max. :0
##
## rough_pasture runway rural-residence sump
## Min. :0.00000 Min. :0.0000000 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.00000 Median :0.0000000 Median :0.000000 Median :0.000e+00
## Mean :0.01066 Mean :0.0001223 Mean :0.001884 Mean :4.982e-05
## 3rd Qu.:0.00000 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000e+00
## Max. :0.28616 Max. :0.0123446 Max. :0.091914 Max. :3.232e-03
##
## surrounding-veg tame_pasture trail transfer_station
## Min. :0.0000000 Min. :0.0000 Min. :0.0000000 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0000 1st Qu.:0.0003476 1st Qu.:0
## Median :0.0000000 Median :0.0000 Median :0.0009790 Median :0
## Mean :0.0001282 Mean :0.0146 Mean :0.0012570 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0000 3rd Qu.:0.0018804 3rd Qu.:0
## Max. :0.0346612 Max. :0.2991 Max. :0.0118693 Max. :0
##
## transmission-line truck-trail urban-industrial
## Min. :0.0000000 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.0001198 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.0006074 Median :0.000e+00
## Mean :0.0011787 Mean :0.0011931 Mean :2.583e-05
## 3rd Qu.:0.0003164 3rd Qu.:0.0015813 3rd Qu.:0.000e+00
## Max. :0.0460439 Max. :0.0823490 Max. :4.045e-03
##
## urban-residence vegetated-edge-railways vegetated-edge-roads
## Min. :0.0000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.003433
## Median :0.0000000 Median :0.0000000 Median :0.012251
## Mean :0.0001453 Mean :0.0001874 Mean :0.013592
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.020822
## Max. :0.0191791 Max. :0.0049635 Max. :0.099551
##
## well_cleared_not_confirmed well_cleared_not_drilled well-aband
## Min. :0 Min. :0 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.0001567
## Median :0 Median :0 Median :0.0019988
## Mean :0 Mean :0 Mean :0.0031103
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0045361
## Max. :0 Max. :0 Max. :0.0437908
##
## well-bitumen well-cased well-gas well-oil
## Min. :0 Min. :0.000e+00 Min. :0.000e+00 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.000000
## Median :0 Median :0.000e+00 Median :0.000e+00 Median :0.003253
## Mean :0 Mean :7.166e-05 Mean :5.196e-05 Mean :0.005327
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.009214
## Max. :0 Max. :3.125e-03 Max. :1.857e-03 Max. :0.095784
##
## well-other well-unknown
## Min. :0.000000 Min. :0
## 1st Qu.:0.000000 1st Qu.:0
## Median :0.000000 Median :0
## Mean :0.001677 Mean :0
## 3rd Qu.:0.002377 3rd Qu.:0
## Max. :0.032438 Max. :0
##
```r
summary(srfn_covariate_data$VEG)
```
## site_number buff_dist 110 120
## 1 : 20 Min. : 250 Min. :0.000000 Min. :0.00000
## 2 : 20 1st Qu.:1438 1st Qu.:0.006635 1st Qu.:0.00000
## 4 : 20 Median :2625 Median :0.034291 Median :0.00000
## 6 : 20 Mean :2625 Mean :0.055123 Mean :0.03587
## 10 : 20 3rd Qu.:3812 3rd Qu.:0.068804 3rd Qu.:0.00000
## 12 : 20 Max. :5000 Max. :0.883334 Max. :0.49000
## (Other):1080
## 20 210 220 230
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.03179 1st Qu.:0.1463 1st Qu.:0.00000
## Median :0.00000 Median :0.23137 Median :0.3010 Median :0.01965
## Mean :0.06146 Mean :0.23902 Mean :0.3502 Mean :0.04277
## 3rd Qu.:0.03622 3rd Qu.:0.38303 3rd Qu.:0.5250 3rd Qu.:0.06350
## Max. :0.84113 Max. :0.84699 Max. :1.0000 Max. :0.93137
##
## 33 34 50
## Min. :0.000e+00 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000e+00 1st Qu.:0.01782 1st Qu.:0.04624
## Median :0.000e+00 Median :0.05463 Median :0.10172
## Mean :4.182e-05 Mean :0.05948 Mean :0.15602
## 3rd Qu.:0.000e+00 3rd Qu.:0.08856 3rd Qu.:0.20080
## Max. :3.641e-03 Max. :0.45140 Max. :0.93212
##
```r
summary(srfn_covariate_data$harvest)
```
## site_number buff_dist 1940 1950
## 1 : 20 Min. : 250 Min. :0.0000000 Min. :0.000000
## 2 : 20 1st Qu.:1438 1st Qu.:0.0000000 1st Qu.:0.000000
## 4 : 20 Median :2625 Median :0.0000000 Median :0.000000
## 6 : 20 Mean :2625 Mean :0.0002878 Mean :0.005077
## 10 : 20 3rd Qu.:3812 3rd Qu.:0.0000000 3rd Qu.:0.000000
## 12 : 20 Max. :5000 Max. :0.0243877 Max. :0.891286
## (Other):1080
## 1960 1966 1967 1968
## Min. :0.000000 Min. :0 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.000000 Median :0 Median :0.0000000 Median :0.0000000
## Mean :0.004843 Mean :0 Mean :0.0001106 Mean :0.0001209
## 3rd Qu.:0.001549 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :0.125229 Max. :0 Max. :0.0150943 Max. :0.0135532
##
## 1969 1970 1971 1972 1973
## Min. :0.000e+00 Min. :0.00000 Min. :0 Min. :0 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0.000e+00 Median :0.00000 Median :0 Median :0 Median :0
## Mean :1.067e-05 Mean :0.02033 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0.01921 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :1.691e-03 Max. :0.87957 Max. :0 Max. :0 Max. :0
##
## 1974 1975 1976 1977
## Min. :0 Min. :0.000e+00 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0.000e+00
## Median :0 Median :0.000e+00 Median :0.0000000 Median :0.000e+00
## Mean :0 Mean :5.914e-06 Mean :0.0000021 Mean :3.272e-07
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:0.000e+00
## Max. :0 Max. :1.430e-03 Max. :0.0007532 Max. :2.599e-04
##
## 1978 1979 1980 1981 1982
## Min. :0 Min. :0 Min. :0.000000 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0.000000 Median :0 Median :0
## Mean :0 Mean :0 Mean :0.017130 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.003411 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0.420122 Max. :0 Max. :0
##
## 1983 1984 1985 1986
## Min. :0.000000 Min. :0.000e+00 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000e+00 Median :0.000000 Median :0.000000
## Mean :0.000197 Mean :5.556e-05 Mean :0.001827 Mean :0.007432
## 3rd Qu.:0.000000 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :0.011415 Max. :7.692e-03 Max. :0.087543 Max. :0.197918
##
## 1987 1988 1989 1990
## Min. :0.0000000 Min. :0.000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.0000000 Median :0.000000 Median :0.000000 Median :0.00000
## Mean :0.0003432 Mean :0.001416 Mean :0.002745 Mean :0.02199
## 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00939
## Max. :0.0449292 Max. :0.171834 Max. :0.173129 Max. :0.84354
##
## 1991 1992 1993 1994
## Min. :0 Min. :0 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0 Median :0 Median :0.0000000 Median :0.0000000
## Mean :0 Mean :0 Mean :0.0002048 Mean :0.0007679
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :0 Max. :0 Max. :0.0205565 Max. :0.0779967
##
## 1995 1996 1997 1998
## Min. :0.000e+00 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000e+00 Median :0.000000 Median :0.000000 Median :0.000000
## Mean :6.971e-05 Mean :0.007337 Mean :0.001736 Mean :0.001915
## 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :6.484e-03 Max. :0.788790 Max. :0.126973 Max. :0.108919
##
## 1999 2000 2001 2002
## Min. :0.0000000 Min. :0.000000 Min. :0.000e+00 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0
## Median :0.0000000 Median :0.000000 Median :0.000e+00 Median :0
## Mean :0.0004213 Mean :0.007223 Mean :7.072e-05 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000e+00 3rd Qu.:0
## Max. :0.0388934 Max. :0.393858 Max. :8.372e-03 Max. :0
##
## 2003 2004 2005 2006
## Min. :0.000000 Min. :0.0000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.000000 Median :0.0000000 Median :0.000000 Median :0.00000
## Mean :0.008926 Mean :0.0043836 Mean :0.002526 Mean :0.01975
## 3rd Qu.:0.000000 3rd Qu.:0.0001052 3rd Qu.:0.000000 3rd Qu.:0.01886
## Max. :0.280990 Max. :0.0906410 Max. :0.244374 Max. :0.42386
##
## 2007 2008 2009 2010
## Min. :0.0000000 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.0000000 Median :0.00000 Median :0.00000 Median :0.000000
## Mean :0.0003268 Mean :0.00844 Mean :0.01501 Mean :0.009539
## 3rd Qu.:0.0000000 3rd Qu.:0.00000 3rd Qu.:0.01630 3rd Qu.:0.000000
## Max. :0.0326652 Max. :0.49764 Max. :0.37049 Max. :0.478107
##
## 2011 2012 2013 2014
## Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.000000 Median :0.000000 Median :0.000000 Median :0.000e+00
## Mean :0.008219 Mean :0.002717 Mean :0.004286 Mean :8.902e-05
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000e+00
## Max. :0.237154 Max. :0.103264 Max. :0.289583 Max. :4.485e-03
##
## 2015 2016 2017 2018
## Min. :0.0000000 Min. :0.000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.0000000 Median :0.000000 Median :0.00000 Median :0.000000
## Mean :0.0130128 Mean :0.000608 Mean :0.01409 Mean :0.003066
## 3rd Qu.:0.0006807 3rd Qu.:0.000000 3rd Qu.:0.01322 3rd Qu.:0.000000
## Max. :0.4669166 Max. :0.037603 Max. :0.19362 Max. :0.359185
##
## 2019 2020 2021 buffer_area
## Min. :0.000000 Min. :0.00000 Min. :0.000000 Min. : 196260
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.: 6525640
## Median :0.000000 Median :0.00000 Median :0.000000 Median :21686712
## Mean :0.006002 Mean :0.00565 Mean :0.008415 Mean :28163286
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:45679477
## Max. :0.186836 Max. :0.11843 Max. :0.459053 Max. :78503934
##
## feature_area
## Min. :0
## 1st Qu.:0
## Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
##
Everything looks good!
As with the previous sections, this section will likely change each year, but it offers a good starting point. I do all the data manipulation in one code chunk, but I run each portion individually as I build the chunk to make sure it’s working.
This code applies the following formatting to all three files simultaneously using purrr::map(): it replaces every hyphen ('-') in the column names with an underscore ('_').
srfn_covariate_data_fixed <- srfn_covariate_data %>%
  map(
    ~ .x %>%
      set_names(
        names(.) %>%
          # replace every '-' with '_' in the feature column names;
          # a hyphen outside a character class is matched literally, so it needs no escaping
          str_replace_all(pattern = '-',
                          replacement = '_')))
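To see what this map() step does in isolation, here’s a minimal sketch on a made-up list of two tibbles (toy_list and its hyphenated column names are hypothetical stand-ins for srfn_covariate_data). It also shows dplyr::rename_with(), which should produce the same names with a little less nesting; treat it as an alternative, not what this script uses.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# hypothetical toy data standing in for srfn_covariate_data
toy_list <- list(
  HFI = tibble(`site-number` = 1:2, `road-gravel-1l` = c(0, 0.1)),
  VEG = tibble(`site-number` = 1:2, `buff-dist` = c(250L, 250L))
)

# same approach as the chunk above: replace every '-' with '_' in each data frame's names
toy_fixed <- toy_list %>%
  map(~ .x %>%
        set_names(str_replace_all(names(.x), pattern = '-', replacement = '_')))

# alternative: rename_with() applies a function to the column names
toy_fixed_alt <- toy_list %>%
  map(function(df) rename_with(df, ~ str_replace_all(.x, '-', '_')))

map(toy_fixed, names)
```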
Now let’s recheck the data, its structure, and the site_number values against the deployment data. You can run each of these lines individually or all at once, and review each output.
# check structure of variables
str(srfn_covariate_data_fixed)
## List of 3
## $ HFI : tibble [1,200 × 77] (S3: tbl_df/tbl/data.frame)
## ..$ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## ..$ airp_runway : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit_dry : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit_wet : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpits : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ buffer_area : num [1:1200] 196260 196260 196260 196260 196260 ...
## ..$ camp_industrial : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ campground : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ canal : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cfo : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing_unknown : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing_wellpad_unconfirmed: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ conventional_seismic : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
## ..$ country_residence : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ crop : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cultivation_abandoned : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ dugout : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility_other : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility_unknown : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ fruit_vegetables : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ golfcourse : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ greenspace : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ grvl_sand_pit : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ harvest_area : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
## ..$ harvest_area_white_zone : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ lagoon : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ landfill : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ low_impact_seismic : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mill : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mines_pitlake : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ misc_oil_gas_facility : num [1:1200] 0 0.131 0 0 0 ...
## ..$ oil_gas_plant : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ open_pit_mine : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ pipeline : num [1:1200] 0 0.148 0.0148 0 0 ...
## ..$ recreation : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ reservoir : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ residence_clearing : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_mlt_track : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_sgl_track : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_spur : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_gravel_1l : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 0.00 ...
## ..$ road_gravel_2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_1l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_3l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_4l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_div : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_undiv_1l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_undiv_2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_unclassified : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_unimproved : num [1:1200] 0 0 0 0 0.00675 ...
## ..$ road_unpaved_2l : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_winter : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rough_pasture : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ runway : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rural_residence : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ sump : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ surrounding_veg : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ tame_pasture : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ trail : num [1:1200] 0 0 0.011 0 0 ...
## ..$ transfer_station : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ transmission_line : num [1:1200] 0 0 0 0 0 ...
## ..$ truck_trail : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban_industrial : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban_residence : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated_edge_railways : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated_edge_roads : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
## ..$ well_cleared_not_confirmed : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cleared_not_drilled : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_aband : num [1:1200] 0 0 0 0 0 ...
## ..$ well_bitumen : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cased : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_gas : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_oil : num [1:1200] 0 0 0 0 0.0332 ...
## ..$ well_other : num [1:1200] 0 0 0.0183 0.0318 0 ...
## ..$ well_unknown : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ VEG : tibble [1,200 × 11] (S3: tbl_df/tbl/data.frame)
## ..$ site_number: Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## ..$ 110 : num [1:1200] 0 0.3608 0.0618 0 0 ...
## ..$ 120 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 20 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 210 : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
## ..$ 220 : num [1:1200] 0 0.18 0 0 0 ...
## ..$ 230 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 33 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 34 : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
## ..$ 50 : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
## $ harvest: tibble [1,200 × 63] (S3: tbl_df/tbl/data.frame)
## ..$ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## ..$ 1940 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1950 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1960 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1966 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1967 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1968 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1969 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1970 : num [1:1200] 0 0.342 0 0.209 0 ...
## ..$ 1971 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1972 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1973 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1974 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1975 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1976 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1977 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1978 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1979 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1980 : num [1:1200] 0 0 0 0 0 ...
## ..$ 1981 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1982 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1983 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1984 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1985 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1986 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1987 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1988 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1989 : num [1:1200] 0.0285 0 0 0 0 ...
## ..$ 1990 : num [1:1200] 0 0 0 0 0 ...
## ..$ 1991 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1992 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1993 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1994 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1995 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1996 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1997 : num [1:1200] 0.0478 0 0 0 0 ...
## ..$ 1998 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 1999 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2000 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2001 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2002 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2003 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2004 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2005 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2006 : num [1:1200] 0 0 0 0.179 0.424 ...
## ..$ 2007 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2008 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2009 : num [1:1200] 0 0 0 0 0 ...
## ..$ 2010 : num [1:1200] 0.355 0 0 0 0 ...
## ..$ 2011 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2012 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2013 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2014 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2015 : num [1:1200] 0 0 0 0 0 ...
## ..$ 2016 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2017 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2018 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2019 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2020 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ 2021 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ buffer_area : num [1:1200] 196260 196260 196260 196260 196260 ...
## ..$ feature_area: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
# take a look at the column names
names(srfn_covariate_data_fixed$HFI)
## [1] "site_number" "buff_dist"
## [3] "airp_runway" "borrowpit_dry"
## [5] "borrowpit_wet" "borrowpits"
## [7] "buffer_area" "camp_industrial"
## [9] "campground" "canal"
## [11] "cfo" "clearing_unknown"
## [13] "clearing_wellpad_unconfirmed" "conventional_seismic"
## [15] "country_residence" "crop"
## [17] "cultivation_abandoned" "dugout"
## [19] "facility_other" "facility_unknown"
## [21] "fruit_vegetables" "golfcourse"
## [23] "greenspace" "grvl_sand_pit"
## [25] "harvest_area" "harvest_area_white_zone"
## [27] "lagoon" "landfill"
## [29] "low_impact_seismic" "mill"
## [31] "mines_pitlake" "misc_oil_gas_facility"
## [33] "oil_gas_plant" "open_pit_mine"
## [35] "pipeline" "recreation"
## [37] "reservoir" "residence_clearing"
## [39] "rlwy_mlt_track" "rlwy_sgl_track"
## [41] "rlwy_spur" "road_gravel_1l"
## [43] "road_gravel_2l" "road_paved_1l"
## [45] "road_paved_2l" "road_paved_3l"
## [47] "road_paved_4l" "road_paved_div"
## [49] "road_paved_undiv_1l" "road_paved_undiv_2l"
## [51] "road_unclassified" "road_unimproved"
## [53] "road_unpaved_2l" "road_winter"
## [55] "rough_pasture" "runway"
## [57] "rural_residence" "sump"
## [59] "surrounding_veg" "tame_pasture"
## [61] "trail" "transfer_station"
## [63] "transmission_line" "truck_trail"
## [65] "urban_industrial" "urban_residence"
## [67] "vegetated_edge_railways" "vegetated_edge_roads"
## [69] "well_cleared_not_confirmed" "well_cleared_not_drilled"
## [71] "well_aband" "well_bitumen"
## [73] "well_cased" "well_gas"
## [75] "well_oil" "well_other"
## [77] "well_unknown"
names(srfn_covariate_data_fixed$VEG)
## [1] "site_number" "buff_dist" "110" "120" "20"
## [6] "210" "220" "230" "33" "34"
## [11] "50"
names(srfn_covariate_data_fixed$harvest)
## [1] "site_number" "buff_dist" "1940" "1950" "1960"
## [6] "1966" "1967" "1968" "1969" "1970"
## [11] "1971" "1972" "1973" "1974" "1975"
## [16] "1976" "1977" "1978" "1979" "1980"
## [21] "1981" "1982" "1983" "1984" "1985"
## [26] "1986" "1987" "1988" "1989" "1990"
## [31] "1991" "1992" "1993" "1994" "1995"
## [36] "1996" "1997" "1998" "1999" "2000"
## [41] "2001" "2002" "2003" "2004" "2005"
## [46] "2006" "2007" "2008" "2009" "2010"
## [51] "2011" "2012" "2013" "2014" "2015"
## [56] "2016" "2017" "2018" "2019" "2020"
## [61] "2021" "buffer_area" "feature_area"
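The checks above look at structure and column names. To confirm that the three files share the same site_number and buff_dist combinations before joining, one option is dplyr::anti_join(), which returns the rows of one table that have no match in the other. Here’s a minimal sketch on two made-up tables (hfi_toy and veg_toy are hypothetical; site_number is character rather than factor here to keep the toy join simple). Zero rows in both directions would mean the keys line up.

```r
library(dplyr)
library(tibble)

# hypothetical toy tables standing in for the HFI and VEG data
hfi_toy <- tibble(site_number = c("1", "2", "4"), buff_dist = 250L)
veg_toy <- tibble(site_number = c("1", "2", "6"), buff_dist = 250L)

# rows of hfi_toy with no matching site/buffer in veg_toy, and vice versa
only_in_hfi <- anti_join(hfi_toy, veg_toy, by = c("site_number", "buff_dist"))
only_in_veg <- anti_join(veg_toy, hfi_toy, by = c("site_number", "buff_dist"))

only_in_hfi
only_in_veg
```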
Now we need to join the three files together.
covariates_all <- srfn_covariate_data_fixed$HFI %>%
# use a full join so any site/buffer combinations missing from one file are kept (filled with NA) rather than dropped; there shouldn't be any since we checked the site_number values
full_join(srfn_covariate_data_fixed$VEG,
by = c('site_number', 'buff_dist')) %>%
full_join(srfn_covariate_data_fixed$harvest,
by = c('site_number', 'buff_dist'))
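A minimal sketch of two behaviours of full_join() that show up in the output below, using made-up tables (hfi_toy and harvest_toy are hypothetical): unmatched rows from either side are kept with NA in the other table’s columns, and a column present in both tables but not listed in `by` (here buffer_area) is kept twice with `.x` and `.y` suffixes.

```r
library(dplyr)
library(tibble)

# hypothetical toy tables: buffer_area is in both but is not a join key
hfi_toy <- tibble(site_number = c("1", "2"), buff_dist = 250L,
                  buffer_area = 196260, road = c(0, 0.1))
harvest_toy <- tibble(site_number = c("1", "4"), buff_dist = 250L,
                      buffer_area = 196260, `2010` = c(0.3, 0))

joined <- full_join(hfi_toy, harvest_toy, by = c("site_number", "buff_dist"))

# site 2 has no harvest row and site 4 has no HFI row, so each gets NAs;
# buffer_area appears as buffer_area.x and buffer_area.y
joined
```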
head(covariates_all)
## # A tibble: 6 × 147
## site_number buff_dist airp_runway borrowpit_dry borrowpit_wet borrowpits
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 250 0 0 0 0
## 2 2 250 0 0 0 0
## 3 4 250 0 0 0 0
## 4 6 250 0 0 0 0
## 5 10 250 0 0 0 0
## 6 12 250 0 0 0 0
## # ℹ 141 more variables: buffer_area.x <dbl>, camp_industrial <dbl>,
## # campground <dbl>, canal <dbl>, cfo <dbl>, clearing_unknown <dbl>,
## # clearing_wellpad_unconfirmed <dbl>, conventional_seismic <dbl>,
## # country_residence <dbl>, crop <dbl>, cultivation_abandoned <dbl>,
## # dugout <dbl>, facility_other <dbl>, facility_unknown <dbl>,
## # fruit_vegetables <dbl>, golfcourse <dbl>, greenspace <dbl>,
## # grvl_sand_pit <dbl>, harvest_area <dbl>, harvest_area_white_zone <dbl>, …
summary(covariates_all)
## site_number buff_dist airp_runway borrowpit_dry
## 1 : 20 Min. : 250 Min. :0 Min. :0.0000000
## 2 : 20 1st Qu.:1438 1st Qu.:0 1st Qu.:0.0000000
## 4 : 20 Median :2625 Median :0 Median :0.0000000
## 6 : 20 Mean :2625 Mean :0 Mean :0.0004945
## 10 : 20 3rd Qu.:3812 3rd Qu.:0 3rd Qu.:0.0005115
## 12 : 20 Max. :5000 Max. :0 Max. :0.0296372
## (Other):1080
## borrowpit_wet borrowpits buffer_area.x camp_industrial
## Min. :0.0000000 Min. :0.000e+00 Min. : 196260 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.: 6525640 1st Qu.:0
## Median :0.0000000 Median :0.000e+00 Median :21686712 Median :0
## Mean :0.0001462 Mean :4.598e-05 Mean :28163286 Mean :0
## 3rd Qu.:0.0000760 3rd Qu.:0.000e+00 3rd Qu.:45679477 3rd Qu.:0
## Max. :0.0073210 Max. :2.615e-03 Max. :78503934 Max. :0
##
## campground canal cfo clearing_unknown
## Min. :0.000e+00 Min. :0.0000000 Min. :0 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.0000000 Median :0 Median :4.240e-07
## Mean :2.409e-06 Mean :0.0002124 Mean :0 Mean :8.076e-04
## 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:1.124e-03
## Max. :4.967e-04 Max. :0.0076994 Max. :0 Max. :2.818e-02
##
## clearing_wellpad_unconfirmed conventional_seismic country_residence
## Min. :0.0000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.001827 1st Qu.:0.000000
## Median :0.0000000 Median :0.003612 Median :0.000000
## Mean :0.0001149 Mean :0.004028 Mean :0.000439
## 3rd Qu.:0.0000000 3rd Qu.:0.005451 3rd Qu.:0.000000
## Max. :0.0027152 Max. :0.030028 Max. :0.056385
##
## crop cultivation_abandoned dugout
## Min. :0.00000 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.00000 Median :0.000000 Median :0.000e+00
## Mean :0.02988 Mean :0.001701 Mean :2.309e-05
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.000e+00
## Max. :0.43283 Max. :0.040084 Max. :1.239e-03
##
## facility_other facility_unknown fruit_vegetables golfcourse
## Min. :0.0000000 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0.0000000 Median :0.0000000 Median :0 Median :0
## Mean :0.0003137 Mean :0.0000223 Mean :0 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0
## Max. :0.0774805 Max. :0.0064178 Max. :0 Max. :0
##
## greenspace grvl_sand_pit harvest_area
## Min. :0.000e+00 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.02588
## Median :0.000e+00 Median :0.000000 Median :0.23866
## Mean :1.424e-05 Mean :0.001116 Mean :0.23873
## 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.37536
## Max. :2.346e-03 Max. :0.416663 Max. :0.98631
##
## harvest_area_white_zone lagoon landfill low_impact_seismic
## Min. :0.00000 Min. :0.000e+00 Min. :0 Min. :0.000e+00
## 1st Qu.:0.00000 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.00000 Median :0.000e+00 Median :0 Median :0.000e+00
## Mean :0.01302 Mean :3.106e-05 Mean :0 Mean :1.828e-05
## 3rd Qu.:0.00000 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0.000e+00
## Max. :0.80503 Max. :4.257e-03 Max. :0 Max. :6.059e-03
##
## mill mines_pitlake misc_oil_gas_facility oil_gas_plant open_pit_mine
## Min. :0 Min. :0 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0.0000000 Median :0 Median :0
## Mean :0 Mean :0 Mean :0.0013619 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0007224 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0.1313891 Max. :0 Max. :0
##
## pipeline recreation reservoir residence_clearing
## Min. :0.00000 Min. :0.000e+00 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.00000 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.00450 Median :0.000e+00 Median :0.000e+00 Median :0.0000000
## Mean :0.01031 Mean :6.623e-05 Mean :8.539e-05 Mean :0.0001049
## 3rd Qu.:0.01523 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0.14867 Max. :7.941e-03 Max. :1.393e-02 Max. :0.0132461
##
## rlwy_mlt_track rlwy_sgl_track rlwy_spur road_gravel_1l
## Min. :0 Min. :0.0000000 Min. :0 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.0006477
## Median :0 Median :0.0000000 Median :0 Median :0.0043887
## Mean :0 Mean :0.0001036 Mean :0 Mean :0.0056608
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.0088573
## Max. :0 Max. :0.0036376 Max. :0 Max. :0.0598752
##
## road_gravel_2l road_paved_1l road_paved_2l road_paved_3l
## Min. :0.000e+00 Min. :0.000e+00 Min. :0 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0
## Median :0.000e+00 Median :0.000e+00 Median :0 Median :0
## Mean :3.886e-05 Mean :5.918e-06 Mean :0 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0
## Max. :1.820e-03 Max. :6.158e-04 Max. :0 Max. :0
##
## road_paved_4l road_paved_div road_paved_undiv_1l road_paved_undiv_2l
## Min. :0 Min. :0 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0 Median :0 Median :0.000e+00 Median :0.0000000
## Mean :0 Mean :0 Mean :7.538e-06 Mean :0.0005671
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0 Max. :0 Max. :1.051e-03 Max. :0.0066563
##
## road_unclassified road_unimproved road_unpaved_2l road_winter
## Min. :0.0000000 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0001997 1st Qu.:0 1st Qu.:0
## Median :0.0000000 Median :0.0009214 Median :0 Median :0
## Mean :0.0001274 Mean :0.0011036 Mean :0 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0014730 3rd Qu.:0 3rd Qu.:0
## Max. :0.0145510 Max. :0.0237365 Max. :0 Max. :0
##
## rough_pasture runway rural_residence sump
## Min. :0.00000 Min. :0.0000000 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.00000 Median :0.0000000 Median :0.000000 Median :0.000e+00
## Mean :0.01066 Mean :0.0001223 Mean :0.001884 Mean :4.982e-05
## 3rd Qu.:0.00000 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000e+00
## Max. :0.28616 Max. :0.0123446 Max. :0.091914 Max. :3.232e-03
##
## surrounding_veg tame_pasture trail transfer_station
## Min. :0.0000000 Min. :0.0000 Min. :0.0000000 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0000 1st Qu.:0.0003476 1st Qu.:0
## Median :0.0000000 Median :0.0000 Median :0.0009790 Median :0
## Mean :0.0001282 Mean :0.0146 Mean :0.0012570 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0000 3rd Qu.:0.0018804 3rd Qu.:0
## Max. :0.0346612 Max. :0.2991 Max. :0.0118693 Max. :0
##
## transmission_line truck_trail urban_industrial
## Min. :0.0000000 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.0001198 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.0006074 Median :0.000e+00
## Mean :0.0011787 Mean :0.0011931 Mean :2.583e-05
## 3rd Qu.:0.0003164 3rd Qu.:0.0015813 3rd Qu.:0.000e+00
## Max. :0.0460439 Max. :0.0823490 Max. :4.045e-03
##
## urban_residence vegetated_edge_railways vegetated_edge_roads
## Min. :0.0000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.003433
## Median :0.0000000 Median :0.0000000 Median :0.012251
## Mean :0.0001453 Mean :0.0001874 Mean :0.013592
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.020822
## Max. :0.0191791 Max. :0.0049635 Max. :0.099551
##
## well_cleared_not_confirmed well_cleared_not_drilled well_aband
## Min. :0 Min. :0 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.0001567
## Median :0 Median :0 Median :0.0019988
## Mean :0 Mean :0 Mean :0.0031103
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0045361
## Max. :0 Max. :0 Max. :0.0437908
##
## well_bitumen well_cased well_gas well_oil
## Min. :0 Min. :0.000e+00 Min. :0.000e+00 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.000000
## Median :0 Median :0.000e+00 Median :0.000e+00 Median :0.003253
## Mean :0 Mean :7.166e-05 Mean :5.196e-05 Mean :0.005327
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.009214
## Max. :0 Max. :3.125e-03 Max. :1.857e-03 Max. :0.095784
##
## well_other well_unknown 110 120
## Min. :0.000000 Min. :0 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0.006635 1st Qu.:0.00000
## Median :0.000000 Median :0 Median :0.034291 Median :0.00000
## Mean :0.001677 Mean :0 Mean :0.055123 Mean :0.03587
## 3rd Qu.:0.002377 3rd Qu.:0 3rd Qu.:0.068804 3rd Qu.:0.00000
## Max. :0.032438 Max. :0 Max. :0.883334 Max. :0.49000
##
## 20 210 220 230
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.03179 1st Qu.:0.1463 1st Qu.:0.00000
## Median :0.00000 Median :0.23137 Median :0.3010 Median :0.01965
## Mean :0.06146 Mean :0.23902 Mean :0.3502 Mean :0.04277
## 3rd Qu.:0.03622 3rd Qu.:0.38303 3rd Qu.:0.5250 3rd Qu.:0.06350
## Max. :0.84113 Max. :0.84699 Max. :1.0000 Max. :0.93137
##
## 33 34 50 1940
## Min. :0.000e+00 Min. :0.00000 Min. :0.00000 Min. :0.0000000
## 1st Qu.:0.000e+00 1st Qu.:0.01782 1st Qu.:0.04624 1st Qu.:0.0000000
## Median :0.000e+00 Median :0.05463 Median :0.10172 Median :0.0000000
## Mean :4.182e-05 Mean :0.05948 Mean :0.15602 Mean :0.0002878
## 3rd Qu.:0.000e+00 3rd Qu.:0.08856 3rd Qu.:0.20080 3rd Qu.:0.0000000
## Max. :3.641e-03 Max. :0.45140 Max. :0.93212 Max. :0.0243877
##
## 1950 1960 1966 1967
## Min. :0.000000 Min. :0.000000 Min. :0 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0.0000000
## Median :0.000000 Median :0.000000 Median :0 Median :0.0000000
## Mean :0.005077 Mean :0.004843 Mean :0 Mean :0.0001106
## 3rd Qu.:0.000000 3rd Qu.:0.001549 3rd Qu.:0 3rd Qu.:0.0000000
## Max. :0.891286 Max. :0.125229 Max. :0 Max. :0.0150943
##
## 1968 1969 1970 1971
## Min. :0.0000000 Min. :0.000e+00 Min. :0.00000 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.00000 1st Qu.:0
## Median :0.0000000 Median :0.000e+00 Median :0.00000 Median :0
## Mean :0.0001209 Mean :1.067e-05 Mean :0.02033 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0.01921 3rd Qu.:0
## Max. :0.0135532 Max. :1.691e-03 Max. :0.87957 Max. :0
##
## 1972 1973 1974 1975 1976
## Min. :0 Min. :0 Min. :0 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0 Median :0 Median :0 Median :0.000e+00 Median :0.0000000
## Mean :0 Mean :0 Mean :0 Mean :5.914e-06 Mean :0.0000021
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0 Max. :0 Max. :0 Max. :1.430e-03 Max. :0.0007532
##
## 1977 1978 1979 1980 1981
## Min. :0.000e+00 Min. :0 Min. :0 Min. :0.000000 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0
## Median :0.000e+00 Median :0 Median :0 Median :0.000000 Median :0
## Mean :3.272e-07 Mean :0 Mean :0 Mean :0.017130 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.003411 3rd Qu.:0
## Max. :2.599e-04 Max. :0 Max. :0 Max. :0.420122 Max. :0
##
## 1982 1983 1984 1985
## Min. :0 Min. :0.000000 Min. :0.000e+00 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.000000
## Median :0 Median :0.000000 Median :0.000e+00 Median :0.000000
## Mean :0 Mean :0.000197 Mean :5.556e-05 Mean :0.001827
## 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.000e+00 3rd Qu.:0.000000
## Max. :0 Max. :0.011415 Max. :7.692e-03 Max. :0.087543
##
## 1986 1987 1988 1989
## Min. :0.000000 Min. :0.0000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.0000000 Median :0.000000 Median :0.000000
## Mean :0.007432 Mean :0.0003432 Mean :0.001416 Mean :0.002745
## 3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :0.197918 Max. :0.0449292 Max. :0.171834 Max. :0.173129
##
## 1990 1991 1992 1993
## Min. :0.00000 Min. :0 Min. :0 Min. :0.0000000
## 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000000
## Median :0.00000 Median :0 Median :0 Median :0.0000000
## Mean :0.02199 Mean :0 Mean :0 Mean :0.0002048
## 3rd Qu.:0.00939 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0000000
## Max. :0.84354 Max. :0 Max. :0 Max. :0.0205565
##
## 1994 1995 1996 1997
## Min. :0.0000000 Min. :0.000e+00 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.0000000 Median :0.000e+00 Median :0.000000 Median :0.000000
## Mean :0.0007679 Mean :6.971e-05 Mean :0.007337 Mean :0.001736
## 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :0.0779967 Max. :6.484e-03 Max. :0.788790 Max. :0.126973
##
## 1998 1999 2000 2001
## Min. :0.000000 Min. :0.0000000 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.000000 Median :0.0000000 Median :0.000000 Median :0.000e+00
## Mean :0.001915 Mean :0.0004213 Mean :0.007223 Mean :7.072e-05
## 3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000e+00
## Max. :0.108919 Max. :0.0388934 Max. :0.393858 Max. :8.372e-03
##
## 2002 2003 2004 2005
## Min. :0 Min. :0.000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.000000
## Median :0 Median :0.000000 Median :0.0000000 Median :0.000000
## Mean :0 Mean :0.008926 Mean :0.0043836 Mean :0.002526
## 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.0001052 3rd Qu.:0.000000
## Max. :0 Max. :0.280990 Max. :0.0906410 Max. :0.244374
##
## 2006 2007 2008 2009
## Min. :0.00000 Min. :0.0000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000000 Median :0.00000 Median :0.00000
## Mean :0.01975 Mean :0.0003268 Mean :0.00844 Mean :0.01501
## 3rd Qu.:0.01886 3rd Qu.:0.0000000 3rd Qu.:0.00000 3rd Qu.:0.01630
## Max. :0.42386 Max. :0.0326652 Max. :0.49764 Max. :0.37049
##
## 2010 2011 2012 2013
## Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.009539 Mean :0.008219 Mean :0.002717 Mean :0.004286
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :0.478107 Max. :0.237154 Max. :0.103264 Max. :0.289583
##
## 2014 2015 2016 2017
## Min. :0.000e+00 Min. :0.0000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.000e+00 Median :0.0000000 Median :0.000000 Median :0.00000
## Mean :8.902e-05 Mean :0.0130128 Mean :0.000608 Mean :0.01409
## 3rd Qu.:0.000e+00 3rd Qu.:0.0006807 3rd Qu.:0.000000 3rd Qu.:0.01322
## Max. :4.485e-03 Max. :0.4669166 Max. :0.037603 Max. :0.19362
##
## 2018 2019 2020 2021
## Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.00000 Median :0.000000
## Mean :0.003066 Mean :0.006002 Mean :0.00565 Mean :0.008415
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :0.359185 Max. :0.186836 Max. :0.11843 Max. :0.459053
##
## buffer_area.y feature_area
## Min. : 196260 Min. :0
## 1st Qu.: 6525640 1st Qu.:0
## Median :21686712 Median :0
## Mean :28163286 Mean :0
## 3rd Qu.:45679477 3rd Qu.:0
## Max. :78503934 Max. :0
##
The joined data are missing the site and array columns from the reference data, but these will be added when we merge with the detection data, because the sites match up and there are no duplicate sites.
One last thing: buffer_area appears in both the HFI and harvest data sets. We didn’t use it to join the data because it repeats for each site, so the join kept both copies (buffer_area.x and buffer_area.y). We won’t need it for the analyses, so let’s remove both copies now to clean up the data.
covariates_all <- covariates_all %>%
select(!contains('buffer_area'))
I opened the data in my RStudio viewer window to double-check that this worked.
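If you’d rather check this in code than in the viewer, here’s a minimal sketch on a made-up tibble (toy and its values are hypothetical) showing that `select(!contains('buffer_area'))` drops both suffixed copies, followed by an assertion that will stop with an error if any buffer_area column is left.

```r
library(dplyr)
library(tibble)

# hypothetical toy data with both suffixed copies left over from the join
toy <- tibble(site_number = "1", buff_dist = 250L,
              buffer_area.x = 196260, road = 0.1,
              buffer_area.y = 196260, feature_area = 0)

toy_clean <- toy %>%
  select(!contains("buffer_area"))

# stops with an error if any column name still contains 'buffer_area'
stopifnot(!any(grepl("buffer_area", names(toy_clean), fixed = TRUE)))
names(toy_clean)
```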
Let’s also save this for future use.
# save joined data
write_csv(covariates_all,
'data/processed/srfn_covariates.csv')
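Two things are worth knowing before reusing the saved file. First, write_csv() won’t create a missing folder, so data/processed/ needs to exist. Second, a CSV doesn’t store column classes, so site_number (a factor here) will most likely be read back as a number unless you tell read_csv() otherwise. Here’s a minimal sketch on a toy tibble written to a temporary folder (the folder and file names are hypothetical stand-ins for data/processed/srfn_covariates.csv).

```r
library(readr)
library(tibble)

# temporary stand-in for 'data/processed'; create it if it doesn't exist yet
out_dir <- file.path(tempdir(), "data", "processed")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# hypothetical toy covariates with a factor site_number and a numeric-looking column name
toy <- tibble(site_number = factor(c("1", "2", "4")), buff_dist = 250L,
              `110` = c(0, 0.36, 0.06))
out_file <- file.path(out_dir, "toy_covariates.csv")
write_csv(toy, out_file)

# without col_types, readr guesses site_number as a number
guessed <- read_csv(out_file, show_col_types = FALSE)

# restore site_number as a factor explicitly; other columns read as doubles
toy_back <- read_csv(out_file,
                     col_types = cols(site_number = col_factor(),
                                      .default = col_double()))
str(toy_back)
```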
Now that we’ve merged, cleaned, and reformatted the data, we no longer need the original list of data frames (srfn_covariate_data) or its reformatted version (srfn_covariate_data_fixed). Let’s remove these from the environment so we don’t accidentally use them.
rm(srfn_covariate_data,
srfn_covariate_data_fixed)
There are too many covariates to include in the models individually, and many of them describe similar HFI features.
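The summary() output above also shows that several columns have a maximum of 0 (e.g., airp_runway, mill, landfill), meaning that feature never occurs in any buffer. Here’s a minimal sketch for listing such all-zero columns programmatically, on a made-up tibble (toy and its values are hypothetical); whether to drop or lump these columns is a separate decision.

```r
library(dplyr)
library(tibble)

# hypothetical toy covariates: two features never occur, one does
toy <- tibble(site_number = c("1", "2", "4"),
              airp_runway = 0, mill = 0,
              pipeline = c(0, 0.148, 0.0148))

# names of numeric columns whose values are all zero (ignoring NAs)
all_zero <- toy %>%
  select(where(is.numeric)) %>%
  select(where(~ all(.x == 0, na.rm = TRUE))) %>%
  names()

all_zero
```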
The covariate_table and the README file in this repository include descriptions of each feature, taken from ABMI’s Wall-to-Wall Human Footprint Inventory (Year 2021) data download website. These descriptions can also be found in the relevant_literature folder of this repository (HFI_2021_v1_0_Metadata_Final.pdf).
As we prepare to lump the covariates together, we may need to reference the column names. Let’s print them now so we have them fresh in the console.
names(covariates_all)
## [1] "site_number" "buff_dist"
## [3] "airp_runway" "borrowpit_dry"
## [5] "borrowpit_wet" "borrowpits"
## [7] "camp_industrial" "campground"
## [9] "canal" "cfo"
## [11] "clearing_unknown" "clearing_wellpad_unconfirmed"
## [13] "conventional_seismic" "country_residence"
## [15] "crop" "cultivation_abandoned"
## [17] "dugout" "facility_other"
## [19] "facility_unknown" "fruit_vegetables"
## [21] "golfcourse" "greenspace"
## [23] "grvl_sand_pit" "harvest_area"
## [25] "harvest_area_white_zone" "lagoon"
## [27] "landfill" "low_impact_seismic"
## [29] "mill" "mines_pitlake"
## [31] "misc_oil_gas_facility" "oil_gas_plant"
## [33] "open_pit_mine" "pipeline"
## [35] "recreation" "reservoir"
## [37] "residence_clearing" "rlwy_mlt_track"
## [39] "rlwy_sgl_track" "rlwy_spur"
## [41] "road_gravel_1l" "road_gravel_2l"
## [43] "road_paved_1l" "road_paved_2l"
## [45] "road_paved_3l" "road_paved_4l"
## [47] "road_paved_div" "road_paved_undiv_1l"
## [49] "road_paved_undiv_2l" "road_unclassified"
## [51] "road_unimproved" "road_unpaved_2l"
## [53] "road_winter" "rough_pasture"
## [55] "runway" "rural_residence"
## [57] "sump" "surrounding_veg"
## [59] "tame_pasture" "trail"
## [61] "transfer_station" "transmission_line"
## [63] "truck_trail" "urban_industrial"
## [65] "urban_residence" "vegetated_edge_railways"
## [67] "vegetated_edge_roads" "well_cleared_not_confirmed"
## [69] "well_cleared_not_drilled" "well_aband"
## [71] "well_bitumen" "well_cased"
## [73] "well_gas" "well_oil"
## [75] "well_other" "well_unknown"
## [77] "110" "120"
## [79] "20" "210"
## [81] "220" "230"
## [83] "33" "34"
## [85] "50" "1940"
## [87] "1950" "1960"
## [89] "1966" "1967"
## [91] "1968" "1969"
## [93] "1970" "1971"
## [95] "1972" "1973"
## [97] "1974" "1975"
## [99] "1976" "1977"
## [101] "1978" "1979"
## [103] "1980" "1981"
## [105] "1982" "1983"
## [107] "1984" "1985"
## [109] "1986" "1987"
## [111] "1988" "1989"
## [113] "1990" "1991"
## [115] "1992" "1993"
## [117] "1994" "1995"
## [119] "1996" "1997"
## [121] "1998" "1999"
## [123] "2000" "2001"
## [125] "2002" "2003"
## [127] "2004" "2005"
## [129] "2006" "2007"
## [131] "2008" "2009"
## [133] "2010" "2011"
## [135] "2012" "2013"
## [137] "2014" "2015"
## [139] "2016" "2017"
## [141] "2018" "2019"
## [143] "2020" "2021"
## [145] "feature_area"
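Before grouping by keyword, it can help to check which columns each keyword will actually pick up, and whether any column is matched by more than one keyword (it would then be summed into two groups). Below is a minimal sketch that uses a handful of the column names printed above and a few of the keywords from the grouping step; the object names (`cols`, `keywords`, `template`, `matches`, `overlaps`) are just for illustration. It builds an empty tibble with those names so that `contains()` behaves as it does inside `mutate()`.

```r
library(tidyverse)

# a subset of the column names printed above (for illustration only)
cols <- c("clearing_unknown", "clearing_wellpad_unconfirmed",
          "residence_clearing", "road_gravel_1l", "vegetated_edge_roads",
          "trail", "truck_trail", "well_aband", "well_gas")

# some of the keywords passed to contains() in the grouping step
keywords <- c("clearing", "road", "trail", "well")

# an empty (zero-row) tibble with these names, so contains() works as in mutate()
template <- as_tibble(set_names(rep(list(numeric()), length(cols)), cols))

# for each keyword, the columns that contains() would select
matches <- map(keywords, ~ names(select(template, contains(.x)))) %>%
  set_names(keywords)

# columns selected by more than one keyword end up in more than one group
overlaps <- enframe(matches, name = "keyword", value = "column") %>%
  unnest(column) %>%
  count(column) %>%
  filter(n > 1)

overlaps
```

On these names the only overlap is 'clearing_wellpad_unconfirmed', which contains both 'clearing' and 'well'. The 'road' matches also show why 'vegetated_edge_roads' is renamed before grouping. To run the check on the real data, `cols` could be swapped for `names(covariates_all)` and `keywords` for the full set of keywords used in the grouping step.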
Note to check with Emerald: none of the ris features that came up in the second extraction of the OSM data are present here.
Now we will use the mutate() function with some tidyverse trickery (i.e., nesting across() and contains() inside rowSums()) to sum across each observation (row) by searching for various character strings. If there isn’t a common character string for the variables we want to sum, we provide each one individually. We can also combine these methods (e.g., with ‘facilities’ [see code]).
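To make the rowSums()/across()/contains() pattern concrete, here is a minimal sketch on a made-up two-site tibble (the values are invented, not HFI data). It mirrors the 'borrowpits' grouping: columns sharing the keyword are selected with contains(), and features without a shared keyword are added individually.

```r
library(tidyverse)

# made-up proportions for two sites (not real HFI values)
toy <- tibble(
  site_number = c("1", "2"),
  borrowpit_dry = c(0.1, 0),
  borrowpit_wet = c(0.2, 0.05),
  dugout = c(0, 0.01),
  sump = c(0.3, 0)
)

toy_grouped <- toy %>%
  mutate(
    # across(contains('borrowpit')) selects borrowpit_dry and borrowpit_wet;
    # rowSums() then adds them within each row
    borrowpits = rowSums(across(contains('borrowpit'))) +
      # features without a shared keyword are added individually
      dugout +
      sump,
    # drop the columns used to build borrowpits
    .keep = 'unused')

toy_grouped
```

Site 1 gets 0.1 + 0.2 + 0 + 0.3 = 0.6 and site 2 gets 0 + 0.05 + 0.01 + 0 = 0.06. Because of `.keep = 'unused'`, only site_number and the new borrowpits column remain.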
hfi_covariates_grouped <- covariates_all %>%
# rename 'vegetated_edge_roads' so that we can use 'road' as a keyword to group roads without including this feature
rename('vegetated_edge_rds' = vegetated_edge_roads) %>%
# within the mutate function create new column names for the grouped variables
mutate(
# borrowpits
borrowpits = rowSums(across(contains('borrowpit'))) + # here we use rowSums() with across() and contains() to sum, within each row, the values of every column whose name contains the keyword above. When using this, be careful that no variables you don't want to include also match the string (keyword)!
dugout +
lagoon +
sump,
# non-harvest clearings
clearings = rowSums(across(contains('clearing'))) +
runway,
# cultivations
cultivation = crop +
cultivation_abandoned +
fruit_vegetables +
rough_pasture +
tame_pasture,
# harvest areas
harvest = rowSums(across(contains('harvest'))),
# industrial facilities
facilities = rowSums(across(contains('facility'))) +
rowSums(across(contains('plant'))) +
camp_industrial +
mill +
urban_industrial,
# mine areas
mines = rowSums(across(contains('mine'))) +
rowSums(across(contains('tailing'))) +
grvl_sand_pit,
# railways
railways = rowSums(across(contains('rlwy'))),
# reclaimed areas
reclaimed = rowSums(across(contains('reclaimed'))),
# recreation areas
recreation = campground +
golfcourse +
greenspace +
recreation,
# residential areas (can't use residence as keyword because 'residence_clearing' is in clearing unless we rearrange groupings or rename that one)
residential = country_residence +
rural_residence +
urban_residence,
# roads (we renamed 'vegetated_edge_roads' above to 'vegetated_edge_rds' so we can use roads as keyword here which saves a bunch of coding as there are many many road variables)
roads = rowSums(across(contains('road'))) +
airp_runway +
transfer_station,
# seismic lines
seismic_lines = conventional_seismic,
# 3D seismic lines (put the 3D at the end though to make R happy)
seismic_lines_3D = low_impact_seismic,
# transmission lines
transmission_lines = rowSums(across(contains('transmission'))),
# trails
trails = rowSums(across(contains('trail'))),
# vegetated edges
veg_edges = rowSums(across(contains('vegetated'))) +
surrounding_veg,
# man-made water features
water = canal +
reservoir,
# well sites (note: contains('well') also matches 'clearing_wellpad_unconfirmed', so that feature is counted in both clearings and wells)
wells = rowSums(across(contains('well'))),
# we will group harvest into two 'bins' years 2000 + and pre 2000, the below code only works if the columns are ordered numerically and no columns of non-harvest data included between the necessary columns
harvest_pre2000 = rowSums(across(`1940`:`1999`)),
harvest_2000 = rowSums(across(`2000`:`2021`)),
# remove columns that were used to create new columns to tidy the data frame
.keep = 'unused') %>%
# now lets rename the landcover types which are currently just numbers and that isn't super informative
# rename landcover classes
rename(
lc_grassland = '110',
lc_coniferous = '210',
lc_broadleaf = '220',
lc_mixed = '230',
lc_developed = '34',
lc_shrub = '50',
lc_water = '20',
lc_bareground = '33',
lc_agriculture = '120') %>%
# reorder alphabetically except site_number and buff_dist
select(order(colnames(.))) %>%
# we want to move the columns that aren't HFI features or landcover to the front
relocate(.,
c(site_number,
buff_dist)) %>%
# reorder variables so the veg data is after all the HFI data
relocate(starts_with('lc_'),
.after = wells)
# see what's left
names(hfi_covariates_grouped)
## [1] "site_number" "buff_dist" "borrowpits"
## [4] "cfo" "clearings" "cultivation"
## [7] "facilities" "feature_area" "harvest"
## [10] "harvest_2000" "harvest_pre2000" "landfill"
## [13] "mines" "pipeline" "railways"
## [16] "reclaimed" "recreation" "residential"
## [19] "roads" "seismic_lines" "seismic_lines_3D"
## [22] "trails" "transmission_lines" "veg_edges"
## [25] "water" "wells" "lc_agriculture"
## [28] "lc_bareground" "lc_broadleaf" "lc_coniferous"
## [31] "lc_developed" "lc_grassland" "lc_mixed"
## [34] "lc_shrub" "lc_water"
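The harvest_pre2000 and harvest_2000 bins rely on the year columns being contiguous and in numeric order (see the comment in the grouping code). A possible order-independent alternative, assuming (as in the names() output above) that the harvest years are the only four-digit column names while the landcover codes have two or three digits, is to pick the year columns by name pattern and split them by value. The sketch below uses made-up values with the columns deliberately out of order.

```r
library(tidyverse)

# made-up values: landcover code 110 plus harvest-year columns in scrambled order
toy <- tibble(
  `110` = c(0.2, 0.1),
  `1995` = c(0.1, 0),
  `2003` = c(0.05, 0.2),
  `1960` = c(0, 0.3),
  `2021` = c(0.01, 0)
)

# assumption: every four-digit column name is a harvest year
year_cols <- names(toy)[str_detect(names(toy), '^[0-9]{4}$')]
years <- as.integer(year_cols)

toy_bins <- toy %>%
  mutate(
    # sum the year columns before 2000 and from 2000 on, wherever they sit
    harvest_pre2000 = rowSums(across(all_of(year_cols[years < 2000]))),
    harvest_2000 = rowSums(across(all_of(year_cols[years >= 2000]))))

toy_bins
```

Here harvest_pre2000 sums the 1995 and 1960 columns and harvest_2000 sums 2003 and 2021, even though neither pair is adjacent.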
# check the structure of new data
str(hfi_covariates_grouped)
## tibble [1,200 × 35] (S3: tbl_df/tbl/data.frame)
## $ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## $ borrowpits : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ cfo : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ clearings : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ cultivation : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ facilities : num [1:1200] 0 0.131 0 0 0 ...
## $ feature_area : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ harvest : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
## $ harvest_2000 : num [1:1200] 0.355 0 0 0.179 0.424 ...
## $ harvest_pre2000 : num [1:1200] 0.0763 0.3418 0 0.209 0 ...
## $ landfill : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ mines : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ pipeline : num [1:1200] 0 0.148 0.0148 0 0 ...
## $ railways : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ reclaimed : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ recreation : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ residential : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ roads : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 6.75e-03 ...
## $ seismic_lines : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
## $ seismic_lines_3D : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ trails : num [1:1200] 0 0 0.011 0 0 ...
## $ transmission_lines: num [1:1200] 0 0 0 0 0 ...
## $ veg_edges : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
## $ water : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ wells : num [1:1200] 0 0 0.0183 0.0318 0.0332 ...
## $ lc_agriculture : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_bareground : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_broadleaf : num [1:1200] 0 0.18 0 0 0 ...
## $ lc_coniferous : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
## $ lc_developed : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
## $ lc_grassland : num [1:1200] 0 0.3608 0.0618 0 0 ...
## $ lc_mixed : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_shrub : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
## $ lc_water : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
# check summary of new data
summary(hfi_covariates_grouped)
## site_number buff_dist borrowpits cfo
## 1 : 20 Min. : 250 Min. :0.0000000 Min. :0
## 2 : 20 1st Qu.:1438 1st Qu.:0.0000000 1st Qu.:0
## 4 : 20 Median :2625 Median :0.0002555 Median :0
## 6 : 20 Mean :2625 Mean :0.0007907 Mean :0
## 10 : 20 3rd Qu.:3812 3rd Qu.:0.0009892 3rd Qu.:0
## 12 : 20 Max. :5000 Max. :0.0296372 Max. :0
## (Other):1080
## clearings cultivation facilities feature_area
## Min. :0.0000000 Min. :0.00000 Min. :0.000000 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0
## Median :0.0001076 Median :0.00000 Median :0.000000 Median :0
## Mean :0.0011496 Mean :0.05684 Mean :0.001724 Mean :0
## 3rd Qu.:0.0016142 3rd Qu.:0.00000 3rd Qu.:0.001291 3rd Qu.:0
## Max. :0.0281760 Max. :0.62457 Max. :0.131389 Max. :0
##
## harvest harvest_2000 harvest_pre2000 landfill
## Min. :0.0000 Min. :0.000000 Min. :0.00000 Min. :0
## 1st Qu.:0.0879 1st Qu.:0.008954 1st Qu.:0.00000 1st Qu.:0
## Median :0.2466 Median :0.133834 Median :0.05305 Median :0
## Mean :0.2517 Mean :0.142348 Mean :0.09638 Mean :0
## 3rd Qu.:0.3814 3rd Qu.:0.221274 3rd Qu.:0.14270 3rd Qu.:0
## Max. :0.9863 Max. :0.856826 Max. :0.98631 Max. :0
##
## mines pipeline railways reclaimed
## Min. :0.000000 Min. :0.00000 Min. :0.0000000 Min. :0
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0
## Median :0.000000 Median :0.00450 Median :0.0000000 Median :0
## Mean :0.001116 Mean :0.01031 Mean :0.0001036 Mean :0
## 3rd Qu.:0.000000 3rd Qu.:0.01523 3rd Qu.:0.0000000 3rd Qu.:0
## Max. :0.416663 Max. :0.14867 Max. :0.0036376 Max. :0
##
## recreation residential roads seismic_lines
## Min. :0.000e+00 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.002420 1st Qu.:0.001827
## Median :0.000e+00 Median :0.000000 Median :0.007065 Median :0.003612
## Mean :8.288e-05 Mean :0.002469 Mean :0.007511 Mean :0.004028
## 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.011097 3rd Qu.:0.005451
## Max. :8.322e-03 Max. :0.091914 Max. :0.059875 Max. :0.030028
##
## seismic_lines_3D trails transmission_lines veg_edges
## Min. :0.000e+00 Min. :0.000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.000e+00 1st Qu.:0.001088 1st Qu.:0.0000000 1st Qu.:0.003484
## Median :0.000e+00 Median :0.002230 Median :0.0000000 Median :0.012634
## Mean :1.828e-05 Mean :0.002450 Mean :0.0011787 Mean :0.013908
## 3rd Qu.:0.000e+00 3rd Qu.:0.003230 3rd Qu.:0.0003164 3rd Qu.:0.021155
## Max. :6.059e-03 Max. :0.082349 Max. :0.0460439 Max. :0.099551
##
## water wells lc_agriculture lc_bareground
## Min. :0.0000000 Min. :0.0000000 Min. :0.00000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.0008336 1st Qu.:0.00000 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.0089454 Median :0.00000 Median :0.000e+00
## Mean :0.0002978 Mean :0.0103535 Mean :0.03587 Mean :4.182e-05
## 3rd Qu.:0.0000000 3rd Qu.:0.0175866 3rd Qu.:0.00000 3rd Qu.:0.000e+00
## Max. :0.0139309 Max. :0.0957837 Max. :0.49000 Max. :3.641e-03
##
## lc_broadleaf lc_coniferous lc_developed lc_grassland
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.1463 1st Qu.:0.03179 1st Qu.:0.01782 1st Qu.:0.006635
## Median :0.3010 Median :0.23137 Median :0.05463 Median :0.034291
## Mean :0.3502 Mean :0.23902 Mean :0.05948 Mean :0.055123
## 3rd Qu.:0.5250 3rd Qu.:0.38303 3rd Qu.:0.08856 3rd Qu.:0.068804
## Max. :1.0000 Max. :0.84699 Max. :0.45140 Max. :0.883334
##
## lc_mixed lc_shrub lc_water
## Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.04624 1st Qu.:0.00000
## Median :0.01965 Median :0.10172 Median :0.00000
## Mean :0.04277 Mean :0.15602 Mean :0.06146
## 3rd Qu.:0.06350 3rd Qu.:0.20080 3rd Qu.:0.03622
## Max. :0.93137 Max. :0.93212 Max. :0.84113
##
Okay, this gives us a smaller data set to work with, but I think we can clean it up further. Based on the summaries here, there are several features we don’t have much data for. We can remove any that are all zeros and check the others visually with histograms of the data.
hfi_covariates_grouped <- hfi_covariates_grouped %>%
select(where(~ !all(. == 0)))
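As a quick illustration of what `select(where(~ !all(. == 0)))` does, here is a sketch on a small made-up tibble in which cfo and landfill are all zeros (as they are in the summary above). The predicate is also applied to the site_number factor; a factor is compared to 0 as text, so it is kept unless every label is "0". The second version (`toy_strict`) is a hypothetical, more explicit alternative that can only ever drop numeric columns.

```r
library(tidyverse)

# made-up values; cfo and landfill are all zeros
toy <- tibble(
  site_number = factor(c("1", "2", "4")),
  cfo = c(0, 0, 0),
  harvest = c(0.43, 0, 0.39),
  landfill = c(0, 0, 0)
)

# the predicate used above: keep columns that are not all zero
toy_nonzero <- toy %>%
  select(where(~ !all(. == 0)))

# hypothetical stricter version: keep every non-numeric column,
# and numeric columns only if at least one value is nonzero
toy_strict <- toy %>%
  select(where(~ !is.numeric(.x) || any(.x != 0)))

names(toy_nonzero)
names(toy_strict)
```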
Let’s look at the histograms again and see if we need to remove any features or feature groups that don’t have enough data; I’m not worrying about the years of harvest data yet.
# Define the starting column and get all column names from that point
start_col <- 'borrowpits'
columns_to_plot <- names(hfi_covariates_grouped)[which(names(hfi_covariates_grouped) == start_col):ncol(hfi_covariates_grouped)]
# Loop over the selected columns and create histograms
for (col in columns_to_plot) {
hist(hfi_covariates_grouped[[col]], main = col, xlab = col)
}
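The loop draws one base-R histogram per column. A possible alternative, sketched below with made-up values, is to reshape the columns to long format with tidyr's pivot_longer() and draw all the histograms as panels of a single ggplot2 figure. Here `select(borrowpits:last_col())` picks the same columns as the `which(...):ncol(...)` indexing in the loop, and `bins = 30` simply makes ggplot2's default explicit.

```r
library(tidyverse)

# made-up values standing in for hfi_covariates_grouped
toy <- tibble(
  site_number = factor(c("1", "2", "4", "6")),
  buff_dist = c(250L, 250L, 500L, 500L),
  borrowpits = c(0, 0.001, 0, 0.0003),
  harvest = c(0.43, 0.34, 0, 0.39),
  wells = c(0, 0, 0.018, 0.032)
)

# long format: one row per (variable, value) pair, for every column from
# 'borrowpits' to the last column (the same columns the loop plots)
hist_data <- toy %>%
  select(borrowpits:last_col()) %>%
  pivot_longer(everything(), names_to = 'variable', values_to = 'value')

# one histogram panel per variable, each with its own axis ranges
hist_plot <- ggplot(hist_data, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ variable, scales = 'free')

hist_plot
```

With `scales = 'free'` each panel gets its own axes, which matters here because the covariates span very different ranges (compare harvest and borrowpits in the summary above).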
IMO, we don’t have enough variation in the data to use the following features/feature groups:
Also, there’s not a lot of data for the following features, which are similar and of interest to OSM, so they have been grouped together in the past and we will group them here as well:
For this analysis we will also combine these:
So let’s modify the data and remove those features for now. This step will likely need to be changed each year.
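Alongside the histograms, a numeric summary can back up the "not enough variation" judgement: the proportion of rows (site and buffer distance combinations) in which each feature is nonzero. Below is a minimal sketch on made-up values; the source doesn't define a cutoff, so any threshold applied to `prop_nonzero` would be an assumption.

```r
library(tidyverse)

# made-up values standing in for hfi_covariates_grouped
toy <- tibble(
  site_number = factor(c("1", "2", "4", "6")),
  buff_dist = c(250L, 250L, 500L, 500L),
  harvest = c(0.43, 0.34, 0, 0.39),
  railways = c(0, 0, 0, 0.0036),
  wells = c(0, 0.01, 0.018, 0.032)
)

# proportion of rows with a nonzero value for each feature, rarest first
prop_nonzero <- toy %>%
  select(!c(site_number, buff_dist)) %>%
  summarise(across(everything(), ~ mean(.x > 0))) %>%
  pivot_longer(everything(), names_to = 'variable', values_to = 'prop_nonzero') %>%
  arrange(prop_nonzero)

prop_nonzero
```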
hfi_covariates_grouped_2 <- hfi_covariates_grouped %>%
# create column industrial
mutate(
industrial = borrowpits +
clearings +
facilities +
mines,
# remove columns we used to make this variable
.keep = 'unused') %>%
# remove other features we don't need
select(!c(cultivation,
recreation,
residential,
seismic_lines_3D,
trails,
transmission_lines,
water,
railways,
lc_bareground,
lc_water)) %>%
# order again
# reorder alphabetically except site_number and buff_dist
select(order(colnames(.))) %>%
# we want to move the columns that aren't HFI features or landcover to the front
relocate(.,
c(site_number,
buff_dist)) %>%
# reorder variables so the veg data is after all the HFI data
relocate(starts_with('lc_'),
.after = wells)
# check that it worked
names(hfi_covariates_grouped_2)
## [1] "site_number" "buff_dist" "harvest" "harvest_2000"
## [5] "harvest_pre2000" "industrial" "pipeline" "roads"
## [9] "seismic_lines" "veg_edges" "wells" "lc_agriculture"
## [13] "lc_broadleaf" "lc_coniferous" "lc_developed" "lc_grassland"
## [17] "lc_mixed" "lc_shrub"
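The alphabetical sort and the two relocate() calls are repeated verbatim for both grouped data frames. One option is to wrap them in a small helper. The sketch below uses a hypothetical function, `reorder_covariates()`, which assumes the data contain site_number, buff_dist, and wells columns, and tries it on a made-up one-row tibble.

```r
library(tidyverse)

# hypothetical helper wrapping the reordering steps used for both grouped data frames;
# assumes the data contain site_number, buff_dist, and wells columns
reorder_covariates <- function(df) {
  df %>%
    # sort all columns alphabetically
    select(order(colnames(.))) %>%
    # move the ID columns to the front
    relocate(site_number, buff_dist) %>%
    # put the landcover columns right after the last HFI feature (wells)
    relocate(starts_with('lc_'), .after = wells)
}

# made-up one-row example with the columns in a scrambled order
toy <- tibble(lc_shrub = 0.1, wells = 0.2, buff_dist = 250L,
              harvest = 0.3, site_number = factor("1"), lc_broadleaf = 0.4)

names(reorder_covariates(toy))
```

In this script it could then be called as `hfi_covariates_grouped_2 <- reorder_covariates(hfi_covariates_grouped_2)`, so any future change to the ordering only has to be made in one place.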
Let’s look at the histograms again.
# Define the starting column and get all column names from that point
start_col <- 'harvest'
columns_to_plot <- names(hfi_covariates_grouped_2)[which(names(hfi_covariates_grouped_2) == start_col):ncol(hfi_covariates_grouped_2)]
# Loop over the selected columns and create histograms
for (col in columns_to_plot) {
hist(hfi_covariates_grouped_2[[col]], main = col, xlab = col)
}
# also drop the combined 'industrial' variable
hfi_covariates_grouped_2 <- hfi_covariates_grouped_2 %>%
select(!c(industrial))
Let’s remove the data frames we no longer need.
rm(covariates_all,
hfi_covariates_grouped)
Now we need to add a column with the full site name from the reference data so that this data can easily be joined with the detection data later.
First, let’s read in the reference data.
sites <- read_csv('data/raw/reference.csv',
# specify column types
col_types = cols(.default = col_factor())) %>%
# I find the original column names confusing, so I'm quickly changing them here
rename(site_number = site,
site = real_site)
Now let’s join them and be done with this!
covariates_final <- hfi_covariates_grouped_2 %>%
# join
left_join(sites,
by = 'site_number') %>%
# relocate site to front
relocate(site,
.after = site_number)
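Before saving, it is worth confirming that every site_number found a match in the reference table and that the join didn't add rows (which would happen if a site_number appeared more than once in reference.csv). Below is a minimal sketch of both checks on made-up stand-ins, `toy_covs` and `toy_sites`, with placeholder site names. In the script both key columns are factors; plain character is used here to keep the example simple.

```r
library(tidyverse)

# made-up stand-ins with placeholder site names
toy_covs <- tibble(site_number = c("1", "1", "2", "4"),
                   buff_dist = c(250L, 500L, 250L, 250L))
toy_sites <- tibble(site_number = c("1", "2", "4", "6"),
                    site = c("SITE_A", "SITE_B", "SITE_C", "SITE_D"))

# covariate rows whose site_number has no match in the reference table
unmatched <- anti_join(toy_covs, toy_sites, by = 'site_number')

joined <- left_join(toy_covs, toy_sites, by = 'site_number')

# every row matched, and the join added no rows
# (extra rows would mean a site_number is duplicated in the reference table)
stopifnot(nrow(unmatched) == 0,
          nrow(joined) == nrow(toy_covs))
```

With dplyr 1.1.0 or later, `left_join()` also accepts `relationship = 'many-to-one'`, which makes the join itself throw an error if a site_number matches more than one row of the reference table.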
Let’s save this data now that it’s all formatted and grouped.
write_csv(covariates_final,
'data/processed/srfn_covariates_grouped.csv')
We are done with this script for now. We have a clean data set with the HFI and harvest covariates grouped the way we could use them in an analysis, and the VEG (landcover) covariates renamed so we don’t have to memorize or look up what the numbers mean.